Oversight Assistants: Turning Compute into Understanding
Currently, we primarily oversee AI with human supervision and human-run experiments, possibly augmented by off-the-shelf AI assistants like ChatGPT or Claude. At training time, we run RLHF, where humans (and/or chat assistants)
Analyzing long agent transcripts (Docent)
This is a brief overview of a recent release by Transluce. You can see the full write-up on the Transluce website.
AI systems are increasingly being used as agents: scaffolded systems in which
Introducing Transluce — A Letter from the Founders
We are launching an independent research lab that builds open, scalable technology for understanding AI systems and steering them in the public interest.
Transluce means to shine light through something to reveal its
Augmenting Statistical Models with Natural Language Parameters
This is a guest post by my student Ruiqi Zhong, who has some very exciting work defining new families of statistical models that can take natural language explanations as parameters. The motivation is
Analyzing the Historical Rate of Catastrophes
To communicate risks, we often turn to stories. Nuclear weapons conjure stories of mutually assured destruction, briefcases with red buttons, and nuclear winter. Climate change conjures stories of extreme weather, cities overtaken by